Privacy Preserving Record Linkage via grams Projections
نویسندگان
چکیده
Record linkage has been extensively used in various data mining applications involving sharing data. While the amount of available data is growing, the concern of disclosing sensitive information poses the problem of utility vs privacy. In this paper, we study the problem of private record linkage via secure data transformations. In contrast to the existing techniques in this area, we propose a novel approach that provides strong privacy guarantees under the formal framework of differential privacy. We develop an embedding strategy based on frequent variable length grams mined in a private way from the original data. We also introduce personalized threshold for matching individual records in the embedded space which achieves better linkage accuracy than the existing global threshold approach. Compared with the state-of-the-art secure matching schema [23], our approach provides formal, provable privacy guarantees and achieves better scalability while providing comparable utility.
منابع مشابه
Privacy-preserving record linkage using Bloom filters
BACKGROUND Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. METHOD...
متن کاملPrivacy preserving interactive record linkage (PPIRL)
OBJECTIVE Record linkage to integrate uncoordinated databases is critical in biomedical research using Big Data. Balancing privacy protection against the need for high quality record linkage requires a human-machine hybrid system to safely manage uncertainty in the ever changing streams of chaotic Big Data. METHODS In the computer science literature, private record linkage is the most publish...
متن کاملPrivacy Preserving Group Linkage
The problem of privacy-preserving record linkage is to find the intersection of records from two parties, while not revealing any private records to each other. Recently, group linkage has been introduced to measure the similarity of groups of records [19]. When we extend the traditional privacy-preserving record linkage methods to group linkage measurement, group membership privacy becomes vul...
متن کاملQuantifying the correctness, computational complexity, and security of privacy-preserving string comparators for record linkage
Record linkage is the task of identifying records from disparate data sources that refer to the same entity. It is an integral component of data processing in distributed settings, where the integration of information from multiple sources can prevent duplication and enrich overall data quality, thus enabling more detailed and correct analysis. Privacy-preserving record linkage (PPRL) is a vari...
متن کاملPrivacy Preserving Probabilistic Record Linkage (P3RL): a novel method for linking existing health-related data and maintaining participant confidentiality
BACKGROUND Record linkage of existing individual health care data is an efficient way to answer important epidemiological research questions. Reuse of individual health-related data faces several problems: Either a unique personal identifier, like social security number, is not available or non-unique person identifiable information, like names, are privacy protected and cannot be accessed. A s...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- CoRR
دوره abs/1208.2773 شماره
صفحات -
تاریخ انتشار 2012